Abstract

By a classical result of Gray et al. (1975) the $\bar\varrho$ distance between stationary processes is identified with an optimal stationary coupling problem of the corresponding stationary measures on the infinite product spaces. This is a modification of the optimal coupling problem from Monge-Kantorovich theory. In this paper we derive some general classes of examples of optimal stationary couplings which allow the $\bar\varrho$ distance to be calculated in explicit form in these cases. We also extend the $\bar\varrho$ distance to random fields and to general nonmetric distance functions and give a construction method for optimal stationary c-couplings. In this case our assumptions require a geometric positive curvature condition.



1 Introduction 



Gray et al. (1975) introduced the $\bar\varrho$ distance between two stationary probability measures $\mu$, $\nu$ on $E^{\mathbb Z}$, where $(E, \varrho)$ is a separable, complete metric space (Polish space). The $\bar\varrho$ distance extends Ornstein's $\bar d$ distance (Ornstein (1973)) and is applied to the information theoretic problem of source coding with a fidelity criterion, when the source statistics are incompletely known. $\bar\varrho$ is defined via the following steps. Let $\varrho_n : E^n \times E^n \to \mathbb R$ denote the average distance per component on $E^n$

$$\varrho_n(x, y) := \frac{1}{n}\sum_{i=0}^{n-1} \varrho(x_i, y_i), \qquad x = (x_0, \dots, x_{n-1}),\ y = (y_0, \dots, y_{n-1}). \tag{1.1}$$

Let $\bar\varrho_n$ denote the corresponding minimal $\ell_1$-metric, also called Wasserstein distance or Kantorovich distance, of the restrictions of $\mu$, $\nu$ on $E^n$, i.e.

$$\bar\varrho_n(\mu, \nu) = \inf\Big\{ \int \varrho_n(x, y)\, d\beta(x, y) \;\Big|\; \beta \in M(\mu^n, \nu^n) \Big\}, \tag{1.2}$$

where $\mu^n$, $\nu^n$ are the restrictions of $\mu$, $\nu$ on $E^n$, i.e. on the coordinates $(x_0, \dots, x_{n-1})$, and $M(\mu^n, \nu^n)$ is the Fréchet class of all measures on $E^n \times E^n$ with marginals $\mu^n$, $\nu^n$. Then the $\bar\varrho$ distance between $\mu$, $\nu$ is defined as

$$\bar\varrho(\mu, \nu) = \sup_{n \in \mathbb N} \bar\varrho_n(\mu, \nu). \tag{1.3}$$

It is known that $\bar\varrho(\mu, \nu) = \lim_{n\to\infty} \bar\varrho_n(\mu, \nu)$ by Fekete's lemma on superadditive sequences.

$\bar\varrho$ has a natural interpretation as average distance per coordinate between two stationary sources in an optimal coupling. In the original Ornstein version $\varrho$ was taken as the discrete metric on a finite alphabet. This interpretation is further justified by the basic representation result (cp. Gray et al. (1975, Theorem 1))

$$\bar\varrho(\mu, \nu) = \bar\varrho_s(\mu, \nu) := \inf_{\Gamma \in M_s(\mu, \nu)} \int \varrho(x_0, y_0)\, d\Gamma(x, y) \tag{1.4}$$

$$= \inf\{ E\varrho(X_0, Y_0) \mid (X, Y) \sim \Gamma \in M_s(\mu, \nu) \}. \tag{1.5}$$

Here $M_s(\mu, \nu)$ is the set of all jointly stationary (i.e. jointly shift invariant) measures on $E^{\mathbb Z} \times E^{\mathbb Z}$ with marginals $\mu$, $\nu$, and $(X, Y) \sim \Gamma$ means that $\Gamma$ is the distribution of $(X, Y)$. Thus $\bar\varrho(\mu, \nu)$ can be seen as a Monge-Kantorovich problem on $E^{\mathbb Z}$ with, however, a modified Fréchet class $M_s(\mu, \nu) \subset M(\mu, \nu)$. (1.5) states this as an optimal coupling problem between jointly stationary processes $X$, $Y$ with marginals $\mu$, $\nu$. A pair of jointly stationary processes $(X, Y)$ with distribution $\Gamma \in M_s(\mu, \nu)$ is called optimal stationary coupling of $\mu$, $\nu$ if it solves problem (1.5), i.e. it minimizes the stationary coupling distance $\bar\varrho_s$.
By definition it is obvious (see Gray et al. (1975)) that

$$\bar\varrho_1(\mu, \nu) \le \bar\varrho(\mu, \nu) \le \int \varrho(x_0, y_0)\, d\mu^0(x_0)\, d\nu^0(y_0), \tag{1.6}$$

the left hand side being the usual minimal $\ell_1$-distance (Kantorovich distance) between the single components $\mu^0$, $\nu^0$.
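The two bounds in (1.6) are easy to evaluate in Ornstein's original setting. The following is a minimal sketch, assuming a finite alphabet with the discrete (0/1) metric, for which the Kantorovich distance of the one-dimensional marginals equals the total variation distance; the function name `bounds_discrete_metric` and the example marginals are illustrative and not taken from the paper.

```python
def bounds_discrete_metric(p, q):
    """Lower and upper bound of (1.6) for the discrete metric on a finite alphabet.

    p, q: probability vectors of the one-dimensional marginals mu^0 and nu^0.
    lower: Kantorovich distance for the 0/1 metric, i.e. total variation distance.
    upper: expected 0/1 distance under the independent coupling, P(X_0 != Y_0).
    """
    if len(p) != len(q):
        raise ValueError("p and q must live on the same alphabet")
    lower = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    upper = 1.0 - sum(a * b for a, b in zip(p, q))
    return lower, upper


# Hypothetical binary marginals with success probabilities 0.7 and 0.4.
lower, upper = bounds_discrete_metric([0.3, 0.7], [0.6, 0.4])
print(lower, upper)
```

Any value of $\bar\varrho(\mu,\nu)$ for processes with these marginals has to lie between the two printed numbers.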



As remarked in Gray et al. (1975, Example 2) the main representation result in (1.4), (1.5) does not use the metric structure of $\varrho$, and $\varrho$ can be replaced by a general cost function $c$ on $E \times E$, implying then the generalized optimal stationary coupling problem

$$\bar c_s(\mu, \nu) = \inf\{ Ec(X_0, Y_0) \mid (X, Y) \sim \Gamma \in M_s(\mu, \nu) \}. \tag{1.7}$$

Only in few cases information on this optimal coupling problem for $\bar\varrho$ resp. $\bar c$ is given in the literature. Gray et al. (1975) determine $\bar\varrho$ for two i.i.d. binary sequences with success probabilities $p_1$, $p_2$. They also derive for quadratic cost $c(x_0, y_0) = (x_0 - y_0)^2$ upper and lower bounds for two stationary Gaussian time series in terms of their spectral densities. We do not know of further explicit examples in the literature for the $\bar\varrho$ distance. The aim of our paper is to derive optimal couplings and solutions for the $\bar\varrho$ metric resp. the generalized $\bar c$ distance.

The $\bar\varrho$ resp. $\bar c$ distance is particularly adapted to stationary processes. One should note that from the general Monge-Kantorovich theory characterizations of optimal couplings for some classes of distances $c$ are available and have been determined for time series and stochastic processes in some cases. For processes with values in a Hilbert space (like the weighted $\ell_2$ or the weighted $L^2$ space) and for general cost functions $c$, general criteria for optimal couplings have been given in Rüschendorf and Rachev (1990) and Rüschendorf (1991). For some examples and extensions to Banach spaces see also Cuesta-Albertos et al. (1993) and Rüschendorf (1995). Some of these criteria have been further extended to measures $\mu$, $\nu$ in the Wiener space $(W, H, \mu)$ w.r.t. the squared distance $c(x, y) = |x - y|_H^2$ by Feyel and Üstünel (2002, 2004) and Üstünel (2007). All these results are also applicable to stationary measures and characterize optimal couplings between them. But they do not respect the special stationary structure as described in the representation result in (1.5), (1.7). In the following sections we want to determine optimal stationary couplings between stationary processes.

In Section 2 we consider the optimal stationary coupling of stationary processes on $\mathbb R$ and on $\mathbb R^m$ with respect to squared distance. In Section 3 we give an extension to the case of random fields. Finally we consider in Section 4 an extension to general cost functions. We interpret an optimal coupling condition by a geometric curvature condition.

2 Optimal couplings of stationary processes w.r.t. squared distance



In this section we consider the case where $E = \mathbb R$ (resp. $\mathbb R^m$), $\Omega = E^{\mathbb Z}$ and with squared distance $c(x_0, y_0) = (x_0 - y_0)^2$ (resp. $\|x_0 - y_0\|_2^2$ on $\mathbb R^m$). Let $L : \Omega \to \Omega$ denote the left shift, $(Lx)_t = x_{t-1}$. Then a pair of processes $(X, Y)$ with values in $\Omega \times \Omega$ is jointly stationary when $(X, Y) \overset{d}{=} (LX, LY)$ ($\overset{d}{=}$ denotes equality in distribution). A Borel measurable map $S : \Omega \to \Omega$ is called equivariant if

$$L \circ S = S \circ L. \tag{2.1}$$

This notion is borrowed from the corresponding notion in statistics, where it is used in connection with statistical group models. The following lemma concerns some elementary properties.

Lemma 2.1 a) A map $S : \Omega \to \Omega$ is equivariant if and only if $S_t(x) = S_0(L^{-t}x)$ for any $t$, $x$.

b) If $X$ is a stationary process and $S$ is equivariant then $(X, S(X))$ is jointly stationary.
Proof:

a) If $L \circ S = S \circ L$ then by induction $S = L^t \circ S \circ L^{-t}$ for all $t \in \mathbb Z$, and thus $S_t(x) = S_0(L^{-t}x)$. Conversely, if $S_t(x) = S_0(L^{-t}x)$, then $S_{t-1}(x) = S_0(L^{-t+1}x) = S_t(Lx)$. This implies $L(S(x)) = S(Lx)$.

b) Since $LX$ has the same law as $X$, it follows that $(LX, L(S(X))) = (LX, S(LX)) = (I, S)(LX) \overset{d}{=} (I, S)(X) = (X, S(X))$, $I$ denoting the identity. □

For $X \sim \mu$ and $S : \Omega \to \Omega$ the pair $(X, S(X))$ is called optimal stationary coupling if it is an optimal stationary coupling w.r.t. $\mu$ and $\nu := \mu^S = S_\#\mu$, i.e., when $\nu$ is the corresponding image (push-forward) measure.

We first consider the case $E = \mathbb R$ and $\Omega = \mathbb R^{\mathbb Z}$. To construct a class of optimal stationary couplings we define for a convex function $f : \mathbb R^n \to \mathbb R$ an equivariant map $S : \Omega \to \Omega$. For $x \in \mathbb R^n$ let

$$\partial f(x) = \{ y \in \mathbb R^n \mid f(z) - f(x) \ge y \cdot (z - x),\ \forall z \in \mathbb R^n \} \tag{2.2}$$

denote the subgradient of $f$ at $x$, where $a \cdot b$ denotes the standard inner product of vectors $a$ and $b$. By convexity $\partial f(x) \ne \emptyset$. Let $F(x) = (F_k(x))_{0 \le k \le n-1}$ be measurable and $F(x) \in \partial f(x)$, $x \in \mathbb R^n$. The equivariant map $S$ is defined via Lemma 2.1 by

$$S_0(x) = \sum_{k=0}^{n-1} F_k(x_{-k}, \dots, x_{-k+n-1}), \qquad S_t(x) = S_0(L^{-t}x), \quad x \in \Omega. \tag{2.3}$$

For terminological reasons we write any map of the form (2.3) as

$$S_0(x) = \sum_{k=0}^{n-1} \partial_k f(x_{-k}, \dots, x_{-k+n-1}), \qquad S_t(x) = S_0(L^{-t}x), \quad x \in \Omega. \tag{2.4}$$

In particular for differentiable convex $f$ the subgradient set coincides with the derivative of $f$, $\partial f(x) = \{\nabla f(x)\}$ and $\partial_t f(x) = \frac{\partial}{\partial x_t} f(x)$.
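The construction (2.4) and the equivariance property (2.1) can be checked numerically. The following is a minimal sketch, assuming a periodic sequence of finite length as a stand-in for an element of $\Omega$ (so that the shift becomes a cyclic rotation) and an illustrative differentiable convex quadratic $f$ on $\mathbb R^2$; the names `left_shift`, `sliding_gradient_map`, `grad_f` and the constant `EPS` are hypothetical.

```python
import numpy as np


def left_shift(x):
    """Left shift of the text, (Lx)_t = x_{t-1}, on a periodic sequence."""
    return np.roll(x, 1)


def sliding_gradient_map(grad_f, x, n):
    """S_t(x) = sum_{k=0}^{n-1} (d_k f)(x_{t-k}, ..., x_{t-k+n-1}), indices mod len(x)."""
    size = len(x)
    s = np.zeros(size)
    for t in range(size):
        for k in range(n):
            window = np.array([x[(t - k + j) % size] for j in range(n)])
            s[t] += grad_f(window)[k]
    return s


EPS = 0.3  # illustrative value; |EPS| <= 1/2 keeps the quadratic below convex


def grad_f(w):
    """Gradient of f(x0, x1) = x0^2/4 + EPS*x0*x1 + x1^2/4."""
    return np.array([w[0] / 2 + EPS * w[1], EPS * w[0] + w[1] / 2])


x = np.random.default_rng(0).normal(size=12)
s = sliding_gradient_map(grad_f, x, n=2)

# Equivariance (2.1): applying S after the shift equals shifting S(x).
print(np.allclose(sliding_gradient_map(grad_f, left_shift(x), 2), left_shift(s)))
```

For this particular $f$ the map reduces to $S_t(x) = x_t + \varepsilon(x_{t-1} + x_{t+1})$, which gives a quick sanity check of the implementation.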
Remark 2.2 a) In information theory a map of the form $S_t(x) = F(x_{t-n+1}, \dots, x_{t+n-1})$ is called a sliding block code (see Gray et al. (1975)). Thus our class of maps $S$ defined in (2.4) are particular sliding block codes.

b) Sei introduced so-called structural gradient models (SGM) for stationary time series, which are defined as $\{ (S_\vartheta)^\# Q \mid \vartheta \in \Theta \}$, where $Q$ is the infinite product of the uniform distribution on $[0, 1]$, on $[0, 1]^{\mathbb Z}$, $\{ S_\vartheta \mid \vartheta \in \Theta \}$ is a parametric family of transformations of the form given in (2.4) and $S_\vartheta^\# Q$ denotes the pullback measure of $Q$ by $S_\vartheta$. It turns out that these models have nice statistical properties, e.g. they allow for simple likelihoods and allow the construction of flexible dependencies. The restriction to functions of the form (2.4) is well founded by an extended Poincaré lemma (see Sei (2010b, Lemma 3)) saying in the case of differentiable $f$ that these functions are the only ones with (the usual) symmetry and with an additional stationarity property $S_{t-1}(x) = S_t(Lx)$ for $x \in \mathbb R^{\mathbb Z}$, which is related to our notion of equivariant mappings.

c) Even if a map $S$ has a representation of the form (2.4), the inverse map $S^{-1}$ does not have the same form in general. We give an example. Let $X = (X_t)_{t \in \mathbb Z}$ be a real-valued stationary process with a spectral representation $X_t = \int_0^1 e^{2\pi i \lambda t} M(d\lambda)$, where $M(d\lambda)$ is an $L^2$-random measure. Define a process $Y = (Y_t)$ by

$$Y_t = S_t(X) := X_t + \varepsilon (X_{t-1} + X_{t+1}), \qquad \varepsilon \ne 0.$$

This is of the form (2.4) with a function $f(x_0, x_1) = x_0^2/4 + \varepsilon x_0 x_1 + x_1^2/4$ which is convex if $|\varepsilon| < 1/2$. Under this condition, the map $X \mapsto Y$ is shown to be invertible as follows. The spectral representation of $Y$ is $N(d\lambda) := (1 + \varepsilon(e^{2\pi i \lambda} + e^{-2\pi i \lambda})) M(d\lambda)$. Then we have the following inverse representation

$$X_t = \int_0^1 \frac{e^{2\pi i \lambda t}}{1 + \varepsilon(e^{2\pi i \lambda} + e^{-2\pi i \lambda})}\, N(d\lambda) = \sum_{s \in \mathbb Z} b_s Y_{t-s},$$

where $(b_s)_{s \in \mathbb Z}$ is defined by $\{1 + \varepsilon(e^{2\pi i \lambda} + e^{-2\pi i \lambda})\}^{-1} = \sum_{s \in \mathbb Z} b_s e^{-2\pi i \lambda s}$. By standard complex analysis, the coefficients $(b_s)$ are explicitly obtained:

$$b_s = \frac{z_+^{|s|}}{\varepsilon (z_+ - z_-)}, \qquad z_\pm := \frac{-1 \pm \sqrt{1 - 4\varepsilon^2}}{2\varepsilon}.$$

Note that $|z_+| < 1$ and $|z_-| > 1$ since $|2\varepsilon| < 1$. Hence $b_s \ne 0$ for all $s \in \mathbb Z$ and the inverse map $S_0^{-1}(Y) = \sum_s b_s Y_s$ does not have a representation as in (2.4).
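The coefficient formula in part c) can be checked without any spectral theory: the sequence $(b_s)$ inverts the filter $X_t \mapsto X_t + \varepsilon(X_{t-1} + X_{t+1})$ exactly when $b_s + \varepsilon(b_{s-1} + b_{s+1})$ equals $1$ for $s = 0$ and $0$ otherwise. The following is a minimal sketch of that check, assuming the closed form $b_s = z_+^{|s|}/(\varepsilon(z_+ - z_-))$ with $\varepsilon(z_+ - z_-) = \sqrt{1 - 4\varepsilon^2}$; the function names are illustrative.

```python
import math


def inverse_coefficient(s, eps):
    """b_s = z_+^{|s|} / sqrt(1 - 4 eps^2), valid for 0 < |eps| < 1/2."""
    if not 0 < abs(eps) < 0.5:
        raise ValueError("need 0 < |eps| < 1/2")
    disc = math.sqrt(1.0 - 4.0 * eps * eps)   # equals eps * (z_+ - z_-)
    z_plus = (-1.0 + disc) / (2.0 * eps)
    return z_plus ** abs(s) / disc


def filtered_coefficient(s, eps):
    """Coefficient at lag s of the forward filter applied to (b_s)."""
    return inverse_coefficient(s, eps) + eps * (
        inverse_coefficient(s - 1, eps) + inverse_coefficient(s + 1, eps)
    )


eps = 0.3  # illustrative value
print([round(filtered_coefficient(s, eps), 12) for s in range(-3, 4)])
```

All lags except $s = 0$ should come out as zero up to rounding, while every $b_s$ itself is nonzero, which is why the inverse is not a finite sliding block code.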

The following theorem implies that the class of equivariant maps defined in (2.4) gives a class of examples of optimal stationary couplings between stationary processes.

Theorem 2.3 (Optimal stationary couplings of stationary processes on $\mathbb R$) Let $f$ be a convex function on $\mathbb R^n$, let $S$ be the equivariant map defined in (2.4) and let $X$ be a stationary process with law $\mu$. Assume that $X_0$ and $\partial_k f(X^n)$ $(k = 0, \dots, n-1)$ are in $L^2(P)$. Then $(X, S(X))$ is an optimal stationary coupling w.r.t. squared distance between $\mu$ and $\mu^S$, i.e.

$$E[(X_0 - S_0(X))^2] = \min_{(X, Y) \sim \Gamma \in M_s(\mu, \mu^S)} E[(X_0 - Y_0)^2] = \bar c_s(\mu, \mu^S).$$

Proof: Fix any $\Gamma \in M_s(\mu, \mu^S)$. By the gluing lemma (see Appendix A), we can construct a jointly stationary process $(X, Y, \tilde X)$ on a common probability space such that $X \sim \mu$, $Y = S(X)$ and $(\tilde X, Y) \sim \Gamma$. From the definition of $Y_0 = S_0(X)$, we have $Y_0 \in L^2(P)$. Then by the assumption of identical marginals

$$A := \tfrac{1}{2} E[(X_0 - Y_0)^2 - (\tilde X_0 - Y_0)^2]$$
$$= E[-X_0 Y_0 + \tilde X_0 Y_0]$$
$$= E[(\tilde X_0 - X_0) S_0(X)]$$
$$= E\Big[ (\tilde X_0 - X_0) \sum_{k=0}^{n-1} (\partial_k f)(X_{-k}, \dots, X_{-k+n-1}) \Big].$$

Using the stationarity assumption on $X$ we get with $X^n = (X_0, \dots, X_{n-1})$, $\tilde X^n = (\tilde X_0, \dots, \tilde X_{n-1})$ that

$$A = E\Big[ \sum_{k=0}^{n-1} (\tilde X_k - X_k)(\partial_k f)(X_0, \dots, X_{n-1}) \Big]$$
$$\le E[f(\tilde X^n) - f(X^n)]$$
$$= 0,$$

where the inequality is a consequence of convexity of $f$. This implies optimality of $(X, Y)$. We note that the last equality uses integrability of $f(X^n)$, which comes from convexity of $f$ and the $L^2$-assumptions. This completes the proof. □

Theorem 2.3 allows us to determine explicit optimal stationary couplings for a large class of examples. Note that, at least in principle, the $\bar\varrho$ distance can be calculated in explicit form for this class of examples.
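As an illustration of such an explicit value, take the map $S_t(x) = x_t + \varepsilon(x_{t-1} + x_{t+1})$ from Remark 2.2 c) and, as an assumption made only for this sketch, an i.i.d. standard normal process $X$. Then $E[(X_0 - S_0(X))^2] = \varepsilon^2 E[(X_{-1} + X_1)^2] = 2\varepsilon^2$, which by Theorem 2.3 is the optimal stationary coupling cost for $|\varepsilon| \le 1/2$. The simulation below compares this with the independent stationary coupling; the sample size, seed and names are arbitrary choices.

```python
import numpy as np

EPS = 0.3      # illustrative value with |EPS| <= 1/2, so that f is convex
N = 200_000    # length of the simulated (periodic) sample path


def sliding_map(x, eps=EPS):
    """S_t(x) = x_t + eps * (x_{t-1} + x_{t+1}) on a periodic sample path."""
    return x + eps * (np.roll(x, 1) + np.roll(x, -1))


rng = np.random.default_rng(0)
x = rng.normal(size=N)          # i.i.d. N(0, 1) path (assumption of this sketch)
x_indep = rng.normal(size=N)    # independent copy for a competing coupling

cost_theorem = np.mean((x - sliding_map(x)) ** 2)              # coupling (X, S(X))
cost_independent = np.mean((x - sliding_map(x_indep)) ** 2)    # independent coupling
exact_value = 2 * EPS ** 2                                     # eps^2 * E[(X_{-1} + X_1)^2]

print(cost_theorem, exact_value, cost_independent)
```

The independent coupling has the same marginals but a much larger cost (about $2 + 2\varepsilon^2$ in this setting), in line with the optimality statement.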

The construction of Theorem 2.3 can be extended to multivariate stationary sequences in the following way. Let $(X_t)_{t \in \mathbb Z}$ be a stationary process, $X_t \in \mathbb R^m$, and let $f : (\mathbb R^m)^n \to \mathbb R$ be a convex function on $(\mathbb R^m)^n$. Define an equivariant map $S : (\mathbb R^m)^{\mathbb Z} \to (\mathbb R^m)^{\mathbb Z}$ by

$$S_0(x) = \sum_{k=0}^{n-1} \partial_k f(x_{-k}, \dots, x_{-k+n-1}), \qquad S_t(x) = S_0(L^{-t}x), \quad x \in \Omega = (\mathbb R^m)^{\mathbb Z}, \tag{2.5}$$

where $L^{-t}$ operates on each component of $x$ and $\partial_\ell f$ is (a representative of) the subgradient of $f$ w.r.t. the $\ell$-th component. Thus for differentiable $f$ we obtain

$$S_0(x) = \sum_{k=0}^{n-1} \nabla_k f(x_{-k}, \dots, x_{-k+n-1}) \tag{2.6}$$

where $\nabla_\ell f$ is the gradient of $f$ w.r.t. the $\ell$-th component.

The classical result for optimal couplings w.r.t. the squared norm distance on $\mathbb R^m$ due to Rüschendorf and Rachev (1990) and Brenier (1991) characterizes optimal couplings $(Y, Z)$ of distributions $P$, $Q$ on $\mathbb R^m$ by the condition that

$$Z \in \partial h(Y) \quad \text{a.s.} \tag{2.7}$$

for some convex function $h$. The construction in (2.5) adapts this result to optimal stationary couplings of stationary processes on $\mathbb R^m$.

Theorem 2.4 (Optimal stationary couplings of stationary processes on $\mathbb R^m$) Let $f$ be a convex function on $(\mathbb R^m)^n$ and let $S$ be the equivariant map on $\Omega = (\mathbb R^m)^{\mathbb Z}$ defined in (2.5). Let $X$ be a stationary process on $\mathbb R^m$ with distribution $\mu$ and assume that $X_0$ and $\partial_k f(X^n)$, $0 \le k \le n-1$, are square integrable. Then $(X, S(X))$ is an optimal stationary coupling between $\mu$ and $\mu^S = S_\#\mu$ w.r.t. squared distance, i.e.

$$E\|X_0 - S_0(X)\|_2^2 = \inf\{ E\|Y_0 - Z_0\|_2^2 \mid (Y, Z) \sim \Gamma \in M_s(\mu, \mu^S) \} = \bar c_s(\mu, \mu^S). \tag{2.8}$$

Proof: The proof is similar to that of Theorem 2.3. For a jointly stationary process $(X, Y, \tilde X)$ with $X \sim \mu$, $Y = S(X)$ and $\tilde X \overset{d}{=} X \sim \mu$ we have using stationarity and convexity as in Theorem 2.3

$$\tfrac{1}{2} E\big( \|X_0 - Y_0\|_2^2 - \|\tilde X_0 - Y_0\|_2^2 \big) = E[-X_0 \cdot Y_0 + \tilde X_0 \cdot Y_0]$$
$$= E\Big[ (\tilde X_0 - X_0) \cdot \sum_{k=0}^{n-1} \partial_k f(X_{-k}, \dots, X_{-k+n-1}) \Big]$$
$$= E\Big[ \sum_{k=0}^{n-1} (\tilde X_k - X_k) \cdot \partial_k f(X_0, \dots, X_{n-1}) \Big]$$
$$\le E\big( f(\tilde X_0, \dots, \tilde X_{n-1}) - f(X_0, \dots, X_{n-1}) \big) = 0.$$

The third equality follows from the stationarity assumption and the inequality follows from convexity of $f$. Thus (2.8) follows. □



Remark 2.5 Considering the case where $\mu$ is a stationary probability measure on $\mathbb R^{\mathbb Z}$ corresponding to the real stationary process $X$ on $\mathbb R$ we can introduce the multivariate stationary process $Y$ by $Y_k = (X_k, X_{k+1}, \dots, X_{k+m-1})$ on $\mathbb R^m$. As a consequence of Theorem 2.4 we obtain explicit optimal coupling results for the strengthened stationary distances relative to (1.3), (1.4), (1.5) by comparing finite dimensional distributions

$$\bar\varrho^{\,m}(\mu, \nu) = \inf\{ E\|Y_0 - Z_0\|^2 \mid Y_0 \overset{d}{=} \mu^m,\ Z_0 \overset{d}{=} \nu^m,\ (Y, Z) \text{ jointly stationary},\ Y \overset{d}{=} \mu,\ Z \overset{d}{=} \nu \}.$$

Thus we can compare and optimally couple not only the one-dimensional marginals in a stationary way but can also compare the multivariate marginals in a stationary way.



3 Optimal stationary couplings of random fields 

In the first part of this section we introduce the $\bar\varrho$ distance defined on a product space in the case of countable groups and establish an extension of the Gray et al. (1975) representation result to random fields. In a second step we extend this result to amenable groups on a Polish function space. This motivates the consideration of the optimal stationary coupling result as in Section 2.

We consider stationary real random fields on an abstract group $G$. Section 2 was concerned with the case of stationary discrete time processes, where $G = \mathbb Z$. Interesting extensions concern the case of stationary random fields on lattices $G = \mathbb Z^d$ or the case of stationary continuous time stochastic processes with $G = \mathbb R$ or $G = \mathbb R^d$.

Let $e$ be the unit element of $G$. We consider the product space $\Omega = E^G$ of a Polish space $(E, \varrho)$ (e.g. $E = \mathbb R$) equipped with the product topology. Note that $\Omega$ is not Polish in general, but its marginal sets $E^F$ on a finite or countable subset $F \subset G$ are Polish. The (left) group action of $G$ on $\Omega$ is defined by $(gx)_h = x_{g^{-1}h}$. In particular, $(gx)_g = x_e$. The function $x \mapsto gx$ is continuous. A Borel probability measure $\mu$ on $\Omega$ is called stationary if $\mu^g = \mu$ for every $g \in G$.






Let $P$ and $Q$ be stationary Borel probability measures on $\Omega = E^G$. For any finite subset $F$ of $G$ and sequences $x_F = (x_g)_{g \in F}$ and $y_F = (y_g)_{g \in F}$, define $\varrho_F(x_F, y_F) = |F|^{-1} \sum_{g \in F} \varrho(x_g, y_g)$. Define $\bar\varrho_F(P, Q)$ by

$$\bar\varrho_F(P, Q) = \inf_{(X_F, Y_F) \sim \Gamma_F \in M(P_F, Q_F)} E[\varrho_F(X_F, Y_F)], \tag{3.1}$$

where $P_F$ and $Q_F$ are marginal distributions of $P$ and $Q$, respectively. The natural extension of the $\bar\varrho$ distance is defined by

$$\bar\varrho(P, Q) = \sup_{F \subset G} \bar\varrho_F(P, Q), \tag{3.2}$$

where the supremum is taken over all finite subsets $F$ of $G$. We also define the stationary coupling distance $\bar\varrho_s$

$$\bar\varrho_s(P, Q) = \inf_{(X, Y) \sim \Gamma \in M_s(P, Q)} E[\varrho(X_e, Y_e)], \tag{3.3}$$

where $M_s(P, Q)$ is the set of jointly stationary measures with marginals $P$ and $Q$.

Gray et al. (1975) showed that $\bar\varrho = \bar\varrho_s$ if $G = \mathbb Z$ (see (1.5)). We will prove this equality for general countable groups $G$ under a weak kind of amenability assumption. In this section, we denote $\Gamma[\varrho] = E[\varrho(X_e, Y_e)]$ and $\Gamma[\varrho_F] = E[\varrho_F(X_F, Y_F)]$ for $\Gamma \in M(P, Q)$.



Lemma 3.1 $\bar\varrho(P, Q) \le \bar\varrho_s(P, Q)$.

Proof: Fix an arbitrary $\varepsilon > 0$. Take a jointly stationary measure $\Gamma \in M_s(P, Q)$ such that $\Gamma[\varrho] \le \bar\varrho_s(P, Q) + \varepsilon$. Then $\bar\varrho_F(P, Q) \le \Gamma[\varrho_F] = \Gamma[\varrho] \le \bar\varrho_s(P, Q) + \varepsilon$. Since $F$ and $\varepsilon$ are arbitrary, we obtain $\bar\varrho(P, Q) \le \bar\varrho_s(P, Q)$. □

We need a technical lemma.

Lemma 3.2 Let $G$ be countable and $F \subset G$ be finite. Then

$$\bar\varrho_F(P, Q) = \inf_{\Gamma \in M(P, Q)} \Gamma[\varrho_F].$$

Proof: It is sufficient to prove existence of $\Gamma \in M(P, Q)$ with marginal $\Gamma_F$ for any $\Gamma_F \in M(P_F, Q_F)$. This follows from the general extension property of probability measures with given marginals. □



To establish the equality $\bar\varrho = \bar\varrho_s$, we put an additional amenability assumption on $G$. The proof of the following representation theorem follows the lines of the proof of Theorem 1 of Gray et al. (1975).

Theorem 3.3 Let $G$ be a countable group. Assume that there exists a sequence $\{F_n\}_{n \ge 0}$ of finite subsets of $G$ such that $\lim_{n\to\infty} |F_n \cap (hF_n)| / |F_n| = 1$ for any $h \in G$. Then

$$\bar\varrho(P, Q) = \bar\varrho_s(P, Q).$$

Proof: Fix $\varepsilon > 0$. For each $n \ge 0$, choose a measure $\Gamma_n \in M(P, Q)$ such that $\Gamma_n[\varrho_{F_n}] \le \bar\varrho_{F_n} + \varepsilon$ (see Lemma 3.2). Define measures $\bar\Gamma_n$ by

$$\bar\Gamma_n(A) = \frac{1}{|F_n|} \sum_{g \in F_n} \Gamma_n(gA).$$

Note that $\bar\Gamma_n[\varrho] = \Gamma_n[\varrho_{F_n}]$. The first marginal measure of $\bar\Gamma_n$ is

$$\bar\Gamma_n(A_1 \times \Omega) = \frac{1}{|F_n|} \sum_{g \in F_n} \Gamma_n(g(A_1 \times \Omega)) = \frac{1}{|F_n|} \sum_{g \in F_n} P(gA_1) = P(A_1)$$

since $P$ is stationary. Similarly, the second marginal measure of $\bar\Gamma_n$ is $Q$. Hence $\bar\Gamma_n \in M(P, Q)$. Since $P$ and $Q$ are tight measures, the sequence $\{\bar\Gamma_n\}_{n \ge 0}$ is tight and therefore has a subsequence converging weakly. We assume without loss of generality that $\{\bar\Gamma_n\}_{n \ge 0}$ itself converges weakly to a measure $\bar\Gamma$. Then $\bar\Gamma \in M(P, Q)$. Furthermore, $\bar\Gamma$ is stationary, i.e. $\bar\Gamma \in M_s(P, Q)$. Indeed, for any $h \in G$ and measurable $A \subset \Omega^2$, we have

$$\bar\Gamma_n(hA) = \frac{1}{|F_n|} \sum_{g \in F_n} \Gamma_n(ghA)$$
$$= \frac{1}{|F_n|} \sum_{g \in F_n \cap (hF_n)} \Gamma_n(gA) + o(1)$$
$$= \bar\Gamma_n(A) + o(1),$$

where we used $\lim_{n\to\infty} |F_n \cap (h^{-1}F_n)| / |F_n| = 1$. This implies stationarity of $\bar\Gamma$. Finally,

$$\bar\varrho_s \le \bar\Gamma[\varrho] \le \limsup_n \bar\Gamma_n[\varrho] = \limsup_n \Gamma_n[\varrho_{F_n}] \le \limsup_n \bar\varrho_{F_n} + \varepsilon \le \bar\varrho + \varepsilon.$$

Since $\varepsilon$ is arbitrary, we have $\bar\varrho_s \le \bar\varrho$. □



Remark 3.4 1. For the example $G = \mathbb Z^d$, we can take $F_n = \{-n, \dots, n\}^d$. On the other hand, if $G$ is the free group generated by two elements $f_1, f_2 \ne e$, then there does not exist a sequence $\{F_n\}$ satisfying the amenability condition because the neighboring set $(f_1 F_n \cup f_2 F_n \cup f_1^{-1} F_n \cup f_2^{-1} F_n) \setminus F_n$ has at least $2|F_n| + 2$ elements.

2. The above given proof extends directly to the case of compact groups where $\bar\Gamma_n$ is defined via integration w.r.t. the normalized Haar measure. An extension of the representation result to general amenable groups on product spaces seems possible, but there are still some technical problems. Instead we will give an extension to amenable transformation groups acting on Polish function spaces.
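The condition of Theorem 3.3 for the boxes $F_n = \{-n, \dots, n\}^d$ in $G = \mathbb Z^d$ can be checked by direct counting. The following is a minimal sketch, assuming the additive group $\mathbb Z^d$ so that $hF_n$ is the translate $h + F_n$; the function names are illustrative.

```python
from itertools import product


def folner_ratio(n, h):
    """|F_n ∩ (h + F_n)| / |F_n| for F_n = {-n, ..., n}^d with d = len(h)."""
    d = len(h)
    box = set(product(range(-n, n + 1), repeat=d))
    shifted = {tuple(a + b for a, b in zip(g, h)) for g in box}
    return len(box & shifted) / len(box)


def folner_ratio_closed_form(n, h):
    """Same ratio, computed coordinate by coordinate."""
    ratio = 1.0
    for hi in h:
        ratio *= max(0, 2 * n + 1 - abs(hi)) / (2 * n + 1)
    return ratio


for n in (1, 5, 25):
    print(n, folner_ratio(n, (1, 2)))
```

For a fixed shift $h$ the printed ratios increase towards $1$ as $n$ grows, which is the asymptotic left invariance required in the theorem.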

Let $(G, \mathcal G)$ be a group of measurable transformations acting on a Polish space $(B, \varrho)$ of real functions on $E$ and let $P$, $Q$ be stationary probability measures on $B$, i.e. $P^g = P$, $Q^g = Q$, $\forall g \in G$. We assume that $G$ is an amenable group, i.e. there exists a sequence $\lambda_n$ of asymptotically left invariant probability measures on $G$ such that

$$\lambda_n(gA) - \lambda_n(A) \to 0, \qquad \forall A \in \mathcal G. \tag{3.4}$$

The hypothesis of amenability is central for example in the theory of invariant tests. Many of the standard transformation groups are amenable. A typical exception is the free group of two generators. The Ornstein distance can be extended to this class of stationary random fields as follows. Define the average distance w.r.t. $\lambda_n$ by

$$\varrho_n(x, y) := \int_G \varrho(gx, gy)\, \lambda_n(dg). \tag{3.5}$$

The induced minimal probability metric is given by

$$\bar\varrho_n(P, Q) = \inf\{ E\varrho_n(X, Y) \mid (X, Y) \sim \Gamma \in M(P, Q) \}. \tag{3.6}$$

Finally, the natural extension of the $\bar\varrho$ metric of Gray et al. (1975) is defined as

$$\bar\varrho(P, Q) = \sup_n \bar\varrho_n(P, Q). \tag{3.7}$$

n 

Remark 3.5 In the particular case when $G$ is countable and $\lambda_n = \frac{1}{|F_n|} \sum_{g \in F_n} \varepsilon_g$ for some increasing class of finite sets $F_n \subset G$ we can take the product space $B = E^G$ and we obtain $\varrho_n(x, y) = \frac{1}{|F_n|} \sum_{g \in F_n} \varrho(gx, gy)$ and

$$\bar\varrho_n(P, Q) = \inf\{ E\varrho_n(X_{F_n}, Y_{F_n}) \mid (X_{F_n}, Y_{F_n}) \sim \Gamma_{F_n} \in M(P_{F_n}, Q_{F_n}) \} \tag{3.8}$$

with $X_{F_n} = (gX)_{g \in F_n} =: \pi_{F_n}(X)$, $Y_{F_n} = (gY)_{g \in F_n} = \pi_{F_n}(Y)$. Thus $\bar\varrho_n$ depends only on the finite dimensional projections $P_{F_n} = P^{\pi_{F_n}}$, $Q_{F_n} = Q^{\pi_{F_n}}$ of $P$, $Q$ and we include the previous framework. Amenability of $G$ corresponds to the condition that $F_n$ is asymptotically left invariant in the sense that

$$|F_n \cap (hF_n)| / |F_n| \to 1, \qquad \forall h \in G, \tag{3.9}$$

i.e. to the condition in Theorem 3.3.

The optimal stationary coupling problem is introduced similarly as in Section 2 by

$$\bar\varrho_s(P, Q) = \inf\{ E[\varrho(eX, eY)] \mid (X, Y) \sim \Gamma \in M_s(P, Q) \} \tag{3.10}$$

where $M_s(P, Q) = \{ \Gamma \in M(P, Q) \mid \Gamma^{(g,g)} = \Gamma,\ \forall g \in G \}$ is the class of jointly stationary measures with marginals $P$, $Q$ and $e$ is the neutral element of $G$. We use the notation $\Gamma(\varrho) = E[\varrho(eX, eY)]$ and $\Gamma_n(\varrho) = E[\varrho_n(X, Y)]$ for $\Gamma \in M(P, Q)$.

We now can state an extension of the Gray-Neuhoff-Shields representation result for the $\bar\varrho$ distance of stationary random fields to amenable groups.

Theorem 3.6 (General representation result for $\bar\varrho$ distance) Let $G$ be an amenable group acting on a Polish function space $B$ on $E$, let $P$, $Q$ be stationary integrable probability measures on $B$, i.e. for $X \sim P$, $E\varrho(X, y) < \infty$ for $y \in B$. Then the extended Ornstein distance $\bar\varrho$ defined in (3.7) coincides with the optimal stationary coupling distance $\bar\varrho_s$,

$$\bar\varrho(P, Q) = \bar\varrho_s(P, Q).$$

In particular, $\bar\varrho$ does not depend on the choice of $\lambda_n$.

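Proof: To prove that $\bar\varrho(P, Q) \le \bar\varrho_s(P, Q)$, let for given $\varepsilon > 0$ the measure $\Gamma \in M_s(P, Q)$ be such that $\Gamma(\varrho) \le \bar\varrho_s(P, Q) + \varepsilon$. Then using the integrability assumption and stationarity of $\Gamma$ we obtain for all $n \in \mathbb N$

$$\bar\varrho_n(P, Q) \le \Gamma_n(\varrho) = E \int \varrho(gX, gY)\, \lambda_n(dg)$$
$$= \int E\varrho(gX, gY)\, \lambda_n(dg) = \Gamma(\varrho) \le \bar\varrho_s(P, Q) + \varepsilon.$$

This implies that $\bar\varrho(P, Q) \le \bar\varrho_s(P, Q)$.

For the converse direction we choose for fixed $\varepsilon > 0$ and $n \ge 0$ an element $\Gamma_n \in M(P, Q)$ such that $\Gamma_n(\varrho) \le \bar\varrho_n(P, Q) + \varepsilon$. We define probability measures $\{\bar\Gamma_n\}$ by

$$\bar\Gamma_n(A) := \int_G \Gamma_n(gA)\, d\lambda_n(g). \tag{3.11}$$

Then using the integrability condition and amenability of $G$ we obtain that

$$\bar\Gamma_n(gA) - \bar\Gamma_n(A) = \int_G \big( \Gamma_n(hgA) - \Gamma_n(hA) \big)\, \lambda_n(dh) \to 0, \tag{3.12}$$

i.e. $\bar\Gamma_n$ is asymptotically left invariant on $B \times B$.

By definition $\bar\Gamma_n \in M(P, Q)$; just take projections on finite components of $\bar\Gamma_n$:

$$\bar\Gamma_n(A_1 \times \Omega) = \int_G \Gamma_n(gA_1 \times \Omega)\, \lambda_n(dg)$$
$$= \int_G P(gA_1)\, \lambda_n(dg) = P(A_1)$$

since $P$ is stationary. Using tightness of $\{\bar\Gamma_n\}$ we get a weakly converging subsequence of $\{\bar\Gamma_n\}$. W.l.o.g. we assume that $\{\bar\Gamma_n\}$ converges weakly to some probability measure $\bar\Gamma$ on $B \times B$. In consequence by (3.12) we get $\bar\Gamma \in M_s(P, Q)$. Finally,

$$\bar\varrho_s(P, Q) \le \bar\Gamma(\varrho) \le \limsup_n \bar\Gamma_n(\varrho)$$
$$\le \limsup_n \bar\varrho_n(P, Q) + \varepsilon \le \bar\varrho(P, Q) + \varepsilon$$

for all $\varepsilon > 0$, which concludes the proof. □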

Motivated by the representation results in Theorems 3.3 and 3.6 we now consider the optimal stationary coupling problem for general groups $G$ acting on $E = \mathbb R$. Let $F$ be a finite subset of $G$ and let $f : \mathbb R^F \to \mathbb R$ be a convex function. The function $f$ is naturally identified with a function on $\Omega$ by $f(x) = f((x_g)_{g \in F})$. As in Section 2 any choice of the subgradient of $f$ is denoted by $((\partial_g f)(x))_{g \in F}$. Define an equivariant Borel measurable function $S : \Omega \to \Omega$ by the shifted sum of gradients

$$S_e(x) = \sum_{g \in F} (\partial_g f)(gx) \quad \text{and} \quad S_h(x) = S_e(h^{-1}x), \quad h \in G. \tag{3.13}$$

Note that $S_e(x)$ depends only on $(x_g)_{g \in G(F)}$, where $G(F)$ is the subgroup generated by $F$ in $G$. We have $S \circ g = g \circ S$ for any $g \in G$ because

$$S_h(gx) = S_e(h^{-1}gx) = S_{g^{-1}h}(x) = (gS(x))_h.$$

Hence if $X$ is a stationary random field, then $(X, S(X))$ is a jointly stationary random field.

We obtain the following theorem. 



Theorem 3.7 Let $P$, $Q$ be stationary random field probability measures with respect to a general group of measurable transformations $G$. Let $S$ be an equivariant map as defined in (3.13) with a convex function $f$. Let $X$ be a real stationary random field with law $\mu$, and assume that $X_e$ and $(\partial_g f(X))_{g \in F}$ are in $L^2(\mu)$. Then $(X, S(X))$ is an optimal stationary coupling w.r.t. squared distance between $\mu$ and $\mu^S$, i.e.

$$E(X_e - S_e(X))^2 = \min_{(X, Y) \sim \Gamma \in M_s(\mu, \mu^S)} E[(X_e - Y_e)^2].$$

Proof: The construction of the equivariant mapping in (3.13) and the following remark allow us to transfer the proof of Theorem 2.4 to the class of random field models. Fix $\Gamma \in M_s(\mu, \mu^S)$. Let $G(F)$ be the subgroup generated by $F$ in $G$. Then $G(F)$ is countable (or finite). We denote the restricted measure of $\mu$ on $\mathbb R^{G(F)}$ by $\mu|_{G(F)}$. By the gluing lemma, we can consider a jointly stationary random field $(X_g, Y_g, \tilde X_g)_{g \in G(F)}$ on a common probability space such that $(X_g)_{g \in G(F)} \sim \mu|_{G(F)}$, $Y_g = S_g(X)$ and $(\tilde X_g, Y_g)_{g \in G(F)} \sim \Gamma|_{G(F)}$. Then we have

$$\tfrac{1}{2} E[(X_e - S_e(X))^2 - (\tilde X_e - S_e(X))^2] = E[S_e(X)(\tilde X_e - X_e)]$$
$$= \sum_{g \in F} E[((\partial_g f)(gX))(\tilde X_e - X_e)]$$
$$= \sum_{g \in F} E[((\partial_g f)(X))(\tilde X_g - X_g)]$$
$$\le E[f(\tilde X) - f(X)]$$
$$= 0.$$

This implies that $(X, S(X))$ is an optimal stationary coupling w.r.t. squared distance between the random fields $\mu$ and $\mu^S = S_\#\mu$. □



4 Optimal stationary couplings for general cost functions 



The Monge-Kantorovich problem and the related characterization of optimal couplings have been generalized to general cost functions $c(x, y)$ in Rüschendorf (1991, 1995), while McCann (2001) extended the squared loss case to manifolds; see also the surveys in Rachev and Rüschendorf (1998) and Villani (2003, 2009). Based on these developments we will extend the optimal stationary coupling results in Sections 2 and 3 to more general classes of distance functions. Some of the relevant notions from transportation theory are collected in Appendix B. We will restrict to the case of time parameter $\mathbb Z$. As in Section 3 an extension to random fields with general time parameter is straightforward.

Let $E_1$, $E_2$ be Polish spaces, and let $c : E_1 \times E_2 \to \mathbb R$ be a measurable cost function. For $f : E_1 \to \mathbb R$ and $x_0 \in E_1$ let

$$\partial^c f(x_0) = \Big\{ y_0 \in E_2 \;\Big|\; c(x_0, y_0) - f(x_0) = \inf_{z_0 \in E_1} \{ c(z_0, y_0) - f(z_0) \} \Big\} \tag{4.1}$$

denote the set of c-supergradients of $f$ in $x_0$.

A function $\varphi : E_1 \to \mathbb R \cup \{-\infty\}$ is called c-concave if there exists a function $\psi : E_2 \to \mathbb R \cup \{-\infty\}$ such that

$$\varphi(x) = \inf_{y \in E_2} (c(x, y) - \psi(y)), \qquad \forall x \in E_1. \tag{4.2}$$

If $\varphi(x) = c(x, y_0) - \psi(y_0)$, then $y_0 \in \partial^c \varphi(x)$ is a c-supergradient of $\varphi$ at $x$. For squared distance $c(x, y) = \|x - y\|_2^2$ in $\mathbb R^m = E_1 = E_2$ c-concavity of $\varphi$ is equivalent to the concavity of $\varphi - \|x\|_2^2$.

The characterization of optimal couplings $T(x) \in \partial^c \varphi(x)$ for some c-concave function $\varphi$ leads for regular $\varphi$ to a differential characterization of c-optimal coupling functions $T$

$$\nabla_x c(x, T(x)) = \nabla \varphi(x) \tag{4.3}$$

see Rüschendorf (1991), Villani (2009). In case (4.3) has a unique solution in $T(x)$ this equation describes optimal c-coupling functions $T$ in terms of differentials of c-concave functions $\varphi$ and the set of c-supergradients $\partial^c \varphi(x)$ reduces to just one element

$$\partial^c \varphi(x) = \{ x - \nabla_x c^*(x, \varphi(x)) \}. \tag{4.4}$$

Here $c^*$ is the Legendre transform of $c$, $\nabla_x c(x, \cdot)$ is invertible and $(\nabla_x c)^{-1}(x, \varphi(x)) = \nabla_x c^*(x, \varphi(x))$ (see Rüschendorf (1991), Rachev and Rüschendorf (1998) and Villani (2003, 2009)). For functions $\varphi$ which are not c-concave, the supergradient $\partial^c \varphi(x)$ may be empty.

The construction of optimal stationary c-couplings of stationary processes can be pursued in the following way. Define the average distance per component $c_n : E_1^n \times E_2^n \to \mathbb R$ by

$$c_n(x, y) := \frac{1}{n} \sum_{t=0}^{n-1} c(x_t, y_t) \tag{4.5}$$

and assume that for some function $f : E_1^n \to \mathbb R$, there exists a function $F^n : E_1^n \to E_2^n$ such that

$$F^n(x) = (F_k(x))_{0 \le k \le n-1} \in \partial^{c_n} f(x), \qquad x \in E_1^n. \tag{4.6}$$

Note that (4.6) needs to be satisfied only on the support of (the projection of) the stationary measure $\mu$. In general we can expect $\partial^{c_n} f(x) \ne \emptyset$, $\forall x \in E_1^n$ only if $f$ is $c_n$-concave. For fixed $y_0, \dots, y_{n-1} \in E_2$ we introduce the function $h_c(x_0) = \frac{1}{n} \sum_{k=0}^{n-1} c(x_0, y_k)$, $x_0 \in E_1$. $h_c(x_0)$ describes the average distance of $x_0$ to the $n$ points $y_0, \dots, y_{n-1}$ in $E_2$. We define an equivariant map $S : E_1^{\mathbb Z} \to E_2^{\mathbb Z}$ by

$$S_0(x) \in \partial^c (h_c(x_0)) \big|_{y_k = F_k(x_{-k}, \dots, x_{-k+n-1}),\ 0 \le k \le n-1},$$
$$S_t(x) = S_0(L^{-t}x), \qquad S(x) = (S_t(x))_{t \in \mathbb Z}. \tag{4.7}$$

Here the c-supergradient is taken for the function $h_c(x_0)$ and the formula is evaluated at $y_k = F_k(x_{-k}, \dots, x_{-k+n-1})$, $0 \le k \le n-1$. After these preparations we can state the following theorem.

Theorem 4.1 (Optimal stationary c-couplings of stationary processes) Let $X = (X_t)_{t \in \mathbb Z}$ be a stationary process with values in $E_1$ and with distribution $\mu$, let $c : E_1 \times E_2 \to \mathbb R$ be a measurable distance function on $E_1 \times E_2$ and let $f : E_1^n \to \mathbb R$ be measurable $c_n$-concave. If $S$ is the equivariant map induced by $f$ in (4.7) and if $c(X_0, S_0(X))$, $\{c(X_k, F_k(X^n))\}_{k=0}^{n-1}$ and $f(X^n)$ are integrable, then $(X, S(X))$ is an optimal stationary c-coupling of the stationary measures $\mu$, $\mu^S$, i.e.

$$Ec(X_0, S_0(X)) = \inf\{ Ec(Y_0, Z_0) \mid (Y, Z) \sim \Gamma \in M_s(\mu, \mu^S) \} = \bar c_s(\mu, \mu^S). \tag{4.8}$$

Proof: The construction of the equivariant function in (4.7) allows us to extend the basic idea of the proof of Theorem 2.3 to the case of general cost function. Fix any $\Gamma \in M_s(\mu, \mu^S)$. By the gluing lemma, we can consider a jointly stationary process $(X, Y, \tilde X)$ on a common probability space with properties $X \sim \mu$, $Y = S(X)$ and $(\tilde X, Y) \sim \Gamma$. Then we have by construction in (4.7) and using stationarity of $X$

$$E[c(X_0, S_0(X)) - c(\tilde X_0, S_0(X))]$$
$$\le E\Big[ \frac{1}{n} \sum_{k=0}^{n-1} \{ c(X_0, y_k) - c(\tilde X_0, y_k) \} \Big|_{y_k = F_k(X_{-k}, \dots, X_{-k+n-1})} \Big]$$
$$= E\Big[ \frac{1}{n} \sum_{k=0}^{n-1} \{ c(X_k, y_k) - c(\tilde X_k, y_k) \} \Big|_{y_k = F_k(X_0, \dots, X_{n-1})} \Big]$$
$$= E\big[ c_n(X^n, F^n(X^n)) - c_n(\tilde X^n, F^n(X^n)) \big]$$
$$\le E[f(X^n) - f(\tilde X^n)]$$
$$= 0.$$

The last inequality follows from $c_n$-concavity of $f$ while the last equality is a consequence of the assumption that $X \overset{d}{=} \tilde X$. As a consequence we obtain that $(X, S(X))$ is an optimal stationary c-coupling. □

The conditions in the construction ( 14.71 ) of optimal stationary couplings in Theorem 
14. II (conditions ( 14.61 ). d4.71 i) simplify essentially in the case n = 1. In this case we get as 
corollary of Theorem l4. 1 1 



Corollary 4.2 Let $X = (X_t)_{t \in \mathbb{Z}}$ be a stationary process with values in $E_1$ and distribution $\mu$ and let $c : E_1 \times E_2 \to \mathbb{R}$ be a cost function as in Theorem 4.1. Let $f : E_1 \to \mathbb{R}$ be measurable c-concave and define

$$S_0(x) \in \partial^c f(x_0), \quad S_t(x) = S_0(L^{-t} x) \in \partial^c f(x_t), \quad S(x) = (S_t(x))_{t \in \mathbb{Z}}. \qquad (4.9)$$

Then $(X, S(X))$ is an optimal stationary c-coupling of the stationary measures $\mu$, $\mu^S$.
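The componentwise construction in (4.9) can be illustrated numerically. The following is only a sketch under assumptions that are not from the paper: the quadratic cost $c(x,y) = (x-y)^2/2$ on $\mathbb{R}$ and the hypothetical choice $f(x) = -x^2/2$, which is c-concave because $x^2/2 - f(x) = x^2$ is convex, so that its c-supergradient is the single point $x - f'(x) = 2x$. The function names are illustrative. The code checks the supergradient inequality on a grid and applies the map coordinate by coordinate to a finite window of a path.

```python
def cost(x, y):
    # Assumed cost c(x, y) = (x - y)^2 / 2 (illustrative choice)
    return 0.5 * (x - y) ** 2

def f(x):
    # Hypothetical c-concave function for this cost
    return -0.5 * x * x

def supergradient(x):
    # For the quadratic cost the c-supergradient is x - f'(x) = 2x
    return 2.0 * x

def apply_componentwise(path):
    # S_t(x) depends only on the coordinate x_t, as in (4.9)
    return [supergradient(x_t) for x_t in path]

def is_supergradient(x, y, grid, tol=1e-12):
    # y is a c-supergradient at x if z -> c(z, y) - f(z) is minimal at z = x
    lhs = cost(x, y) - f(x)
    return all(lhs <= cost(z, y) - f(z) + tol for z in grid)

grid = [i / 10.0 for i in range(-50, 51)]
path = [0.3, -1.2, 0.0, 2.5]
image = apply_componentwise(path)
checks = [is_supergradient(x, y, grid) for x, y in zip(path, image)]
```

Because the map acts on each coordinate separately, it commutes with the shift, which is the equivariance used in the corollary.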

Thus the equivariant componentwise transformation of a stationary process by supergradients of a c-concave function is an optimal stationary coupling. In particular, in the case that $E_1 = \mathbb{R}^k$, several examples of c-optimal transformations are given in Rüschendorf (1995) and Rachev and Rüschendorf (1998), which can be used to apply Corollary 4.2.

In case $n > 1$ conditions (4.6), (4.7) are in general not obvious. In some cases, however, $c_n$-concavity of a function $f : E_1^n \to \mathbb{R}$ is easy to see.

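Lemma 4.3 Let $f(x) = \frac{1}{n} \sum_{k=0}^{n-1} f_k(x_k)$, $f_k : E_1 \to \mathbb{R}$, $0 \le k \le n-1$. If $f_k$ are c-concave, $0 \le k \le n-1$, then $f$ is $c_n$-concave and

$$\partial^{c_n} f(x) = \prod_{k=0}^{n-1} \partial^c f_k(x_k). \qquad (4.10)$$

Proof: Let $y_k \in \partial^c f_k(x_k)$, $0 \le k \le n-1$, then with $y = (y_k)_{0 \le k \le n-1}$ by definition of c-supergradients

$$c_n(x, y) - f(x) = \frac{1}{n} \sum_{k=0}^{n-1} \big(c(x_k, y_k) - f_k(x_k)\big) = \inf\{c_n(z, y) - f(z);\ z \in E_1^n\}$$

and thus $y \in \partial^{c_n} f(x)$. The converse inclusion is obvious. □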



Lemma 4.3 allows us to construct some examples of functions $F^n$ satisfying condition (4.5). For $n > 1$ non-emptiness of the c-supergradient of $h_c(x_0) = \frac{1}{n} \sum_{k=0}^{n-1} c(x_0, y_k)$ has to be established. The condition $u_0 \in \partial^c h_c(x_0)$ is equivalent to

$$c(x_0, u_0) - h_c(x_0) = \inf_{z_0} \big(c(z_0, u_0) - h_c(z_0)\big). \qquad (4.11)$$

In the differentiable case (4.11) implies the necessary condition

$$\nabla_x c(x_0, u_0) = \nabla_x h_c(x_0) = \frac{1}{n} \sum_{k=0}^{n-1} \nabla_x c(x_0, y_k). \qquad (4.12)$$

If the map $u \mapsto \nabla_x c(x_0, u)$ is invertible then equation (4.12) implies

$$u_0 = (\nabla_x c)^{-1}(x_0, \cdot)\Big(\frac{1}{n} \sum_{k=0}^{n-1} \nabla_x c(x_0, y_k)\Big) \qquad (4.13)$$

(see (4.4)). Thus in case that (4.11) has a solution, it is given by (4.13).

Lemma 4.4 If (4.11) has a solution and $u \mapsto \nabla_x c(x_0, u)$ is invertible, then for $x_0 \in E_1$, $u_0 = (\nabla_x c)^{-1}(x_0, \cdot)\big(\frac{1}{n} \sum_{k=0}^{n-1} \nabla_x c(x_0, y_k)\big)$ is a supergradient of $h_c$ in $x_0$,

$$u_0 \in \partial^c h_c(x_0). \qquad (4.14)$$

Example 4.5 If $c(x, y) = H(x - y)$ for a strictly convex function $H$, then $\nabla_x c(x, \cdot)$ is invertible and we can construct the necessary c-supergradients of $h_c$. If for example $c(x, y) = \|x - y\|^2$, then we get for any $x_0 \in \mathbb{R}^k$,

$$u_0 = u_0(x_0) = \frac{1}{n} \sum_{k=0}^{n-1} y_k = \bar y \qquad (4.15)$$

is independent of $x_0$ and

$$\bar y \in \partial^c h_c(x_0), \quad \forall x_0 \in \mathbb{R}^k. \qquad (4.16)$$

If $c(x, y) = \|x - y\|^p$, $p > 1$, then we get for $x_0 \in \mathbb{R}^k$

$$u_0 = u_0(x_0) = x_0 + |h(x_0)|^{\frac{1}{p-1}} \frac{h(x_0)}{|h(x_0)|}, \qquad (4.17)$$

where $h(x_0) = \frac{1}{n} \sum_{k=0}^{n-1} \|x_0 - y_k\|^{p-1} \frac{y_k - x_0}{\|y_k - x_0\|}$. For this and related further examples see Rüschendorf (1993).
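As a hedged illustration of (4.13) and (4.17), the sketch below computes the candidate point $u_0$ for the cost $\|x - y\|^p$ in $\mathbb{R}^k$ and checks numerically that it satisfies the first-order condition (4.12). The function names and the sample points are assumptions made for this example only, and the code presumes $x_0 \ne y_k$ for all $k$ so that every gradient is defined.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def grad_x_cost(x, y, p):
    # Gradient in x of ||x - y||^p for x != y: p * ||x - y||^(p-2) * (x - y)
    d = [a - b for a, b in zip(x, y)]
    r = norm(d)
    return [p * r ** (p - 2) * a for a in d]

def h(x0, ys, p):
    # h(x0) = (1/n) * sum_k ||x0 - y_k||^(p-1) * (y_k - x0) / ||y_k - x0||
    n = len(ys)
    out = [0.0] * len(x0)
    for y in ys:
        d = [b - a for a, b in zip(x0, y)]
        r = norm(d)
        for i, di in enumerate(d):
            out[i] += r ** (p - 2) * di / n
    return out

def u0(x0, ys, p):
    # Candidate supergradient point from (4.17)
    hv = h(x0, ys, p)
    r = norm(hv)
    if r == 0.0:
        return list(x0)
    scale = r ** (1.0 / (p - 1)) / r
    return [a + scale * b for a, b in zip(x0, hv)]

def first_order_residual(x0, ys, p):
    # Largest coordinate-wise deviation between both sides of (4.12)
    u = u0(x0, ys, p)
    lhs = grad_x_cost(x0, u, p)
    n = len(ys)
    grads = [grad_x_cost(x0, y, p) for y in ys]
    rhs = [sum(g[i] for g in grads) / n for i in range(len(x0))]
    return max(abs(a - b) for a, b in zip(lhs, rhs))

x0 = [0.5, -1.0]
ys = [[1.0, 2.0], [-2.0, 0.5], [3.0, -1.5]]
residual_p3 = first_order_residual(x0, ys, 3.0)
u_quadratic = u0(x0, ys, 2.0)  # for p = 2 this is the mean of the y_k
```

The check covers only the necessary condition (4.12). Whether $u_0$ is actually a c-supergradient still depends on (4.11) having a solution, as stated in Lemma 4.4.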



The c-concavity of $h_c$ has a geometrical interpretation: $u_0 \in \partial^c h_c(x_0)$ if the difference between the distance of $z_0$ in $E_1$ to $u_0$ in $E_2$ and the average distance of $z_0$ to the given points $y_0, \dots, y_{n-1}$ in $E_2$ is minimized at $z_0 = x_0$. The c-concavity of $h_c$ can be interpreted as a positive curvature condition for the distance $c$. To handle this condition we introduce the notion of convex stability.

Definition 4.6 The cost function $c$ is called convex stable of index $n \ge 1$ if for any $y \in E_2^n$

$$h_c(x_0) = \frac{1}{n} \sum_{k=0}^{n-1} c(x_0, y_k), \quad x_0 \in E_1, \quad \text{is c-concave.} \qquad (4.18)$$

$c$ is called convex stable if it is convex stable of index $n$ for all $n \ge 1$.

Example 4.7 Let $E_1 = E_2 = H$ be a Hilbert space, as for example $H = \mathbb{R}^m$, let $c(x, y) = \|x - y\|^2/2$ and fix $y \in H^n$, then

$$h_c(x_0) = \frac{1}{n} \sum_{k=0}^{n-1} c(x_0, y_k) = c(x_0, \bar y) + \frac{1}{n} \sum_{k=0}^{n-1} c(\bar y, y_k), \qquad (4.19)$$

where $\bar y = \frac{1}{n} \sum_{k=0}^{n-1} y_k$. Thus by definition (4.2) $h_c$ is c-concave and a c-supergradient of $h_c$ is given by $\bar y$ independent of $x_0$, i.e.

$$\bar y \in \partial^c h_c(x_0), \quad \forall x_0 \in H. \qquad (4.20)$$

Thus the squared distance $c$ is convex stable.
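The decomposition (4.19) is easy to confirm numerically. The sketch below, with illustrative names and arbitrary sample points in $\mathbb{R}^2$ that are not taken from the paper, evaluates both sides of (4.19) and also checks that $c(x_0, \bar y) - h_c(x_0)$ does not depend on $x_0$, which is what makes $\bar y$ a c-supergradient at every point.

```python
def cost(x, y):
    # c(x, y) = ||x - y||^2 / 2 on R^m
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))

def h_c(x0, ys):
    # Average cost from x0 to the points y_0, ..., y_{n-1}
    return sum(cost(x0, y) for y in ys) / len(ys)

def mean(ys):
    n = len(ys)
    return [sum(y[i] for y in ys) / n for i in range(len(ys[0]))]

def rhs_419(x0, ys):
    # Right-hand side of (4.19): c(x0, ybar) + (1/n) * sum_k c(ybar, y_k)
    ybar = mean(ys)
    return cost(x0, ybar) + sum(cost(ybar, y) for y in ys) / len(ys)

ys = [[1.0, 2.0], [-2.0, 0.5], [3.0, -1.5]]
x0 = [0.25, -0.75]
gap = abs(h_c(x0, ys) - rhs_419(x0, ys))

# c(x, ybar) - h_c(x) should take the same value at every x
ybar = mean(ys)
probes = [[0.0, 0.0], [1.0, -2.0], [-3.0, 0.5]]
offsets = [cost(x, ybar) - h_c(x, ys) for x in probes]
spread = max(offsets) - min(offsets)
```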
The property of a cost function to be convex stable is closely connected with the geometric property of non-negative cross curvature. Let $E_1$ and $E_2$ be open connected subsets in $\mathbb{R}^m$ ($m \ge 1$) with coordinates $x = (x^i)_{i=1}^m$ and $y = (y^j)_{j=1}^m$. Let $c : E_1 \times E_2 \to \mathbb{R}$ be $C^{2,2}$, i.e. $c$ is two times differentiable in each variable. Denote the cross derivatives by $c_{ij,k} = \partial^3 c / \partial x^i \partial x^j \partial y^k$ and so on. Define $c_x(x, y) = (\partial c / \partial x^i)_{i=1}^m$, $c_y(x, y) = (\partial c / \partial y^j)_{j=1}^m$, $U = \{c_x(x, y) \mid y \in E_2\} \subset \mathbb{R}^m$, $V = \{c_y(x, y) \mid x \in E_1\} \subset \mathbb{R}^m$. Assume the following two conditions.

[B1] The maps $c_x(x, \cdot) : E_2 \to U$ and $c_y(\cdot, y) : E_1 \to V$ are diffeomorphic, i.e., they are injective and the matrix $(c_{i,j}(x, y))$ is positive definite everywhere.

[B2] The sets $U$ and $V$ are convex.

The conditions [B1] and [B2] are called bi-twist and bi-convex conditions, respectively. Now we define the cross curvature $\sigma(x, y; u, v)$ in $x \in E_1$, $y \in E_2$, $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^m$ by

$$\sigma(x, y; u, v) := \sum_{i,j,k,l} \Big(-c_{ij,kl} + \sum_{p,q} c_{ij,q}\, c^{p,q}\, c_{p,kl}\Big) u^i u^j v^k v^l \qquad (4.21)$$

where $(c^{i,j})$ denotes the inverse matrix of $(c_{i,j})$.

The following result is given by Kim and McCann (2008). Note that these authors use the terminology time-convex sliding-mountain instead of the notion convex-stability as used in this paper.

Proposition 4.8 Assume the conditions [B1] and [B2]. Then $c$ is convex stable if and only if the cross curvature is nonnegative, i.e.,

$$\sigma(x, y; u, v) \ge 0, \quad \forall x, y, u, v. \qquad (4.22)$$



The cross-curvature is related to the Ma-Trudinger-Wang tensor (Ma et al. (2005)), which is the restriction of $\sigma(x, y; u, v)$ to $\sum_{i,j} u^i v^j c_{i,j} = 0$. Known examples that have non-negative cross-curvature are the n-sphere (Kim and McCann (2008), Figalli and Rifford (2009)), its perturbation (Delanoë and Ge (2010), Figalli et al. (2010b)), their tensorial product and their Riemannian submersion.

If $E_1, E_2 \subset \mathbb{R}$, then the conditions [B1] and [B2] are implied by the single condition $c_{x,y} = \partial^2 c(x, y) / \partial x \partial y \ne 0$. Hence we have the following result as a corollary. A self-contained simplified proof of this result is given in Appendix C.

Proposition 4.9 Let $E_1, E_2$ be open intervals in $\mathbb{R}$ and let $c \in C^{2,2}$, $c : E_1 \times E_2 \to \mathbb{R}$. Assume that $c_{x,y} \ne 0$ for all $x, y$. Then $c$ is convex stable if and only if $\sigma(x, y) := -c_{xx,yy} + c_{xx,y} c_{x,yy} / c_{x,y} \ge 0$.

Example 4.10 Let $E_1, E_2 \subset \mathbb{R}$ be open intervals and let $E_1 \cap E_2 = \emptyset$. Consider $c(x, y) = \frac{1}{p} |x - y|^p$ with $p \ge 2$ or $p < 1$. Then $c$ is convex stable. In fact $c_{x,y} = -(p-1) |x - y|^{p-2} \ne 0$ for all $x, y$ and $\sigma(x, y) = (p-1)(p-2) |x - y|^{p-4} \ge 0$ for all $x, y$. As $p \to 0$, we also have a convex stable cost $c(x, y) = \log |x - y|$.
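The value of $\sigma$ in Example 4.10 can be verified by a direct computation. The following worked derivation assumes $x > y$ (the case $x < y$ gives the same result, since both third-order derivatives change sign together) and uses the shorthand $r = x - y$, which is introduced here only for illustration.

```latex
\begin{align*}
c(x,y) &= \tfrac{1}{p}\, r^{p}, \qquad r = x - y > 0, \\
c_{x,y} &= -(p-1)\, r^{p-2}, \\
c_{xx,y} &= -(p-1)(p-2)\, r^{p-3}, \qquad
c_{x,yy} = (p-1)(p-2)\, r^{p-3}, \\
c_{xx,yy} &= (p-1)(p-2)(p-3)\, r^{p-4}, \\
\sigma(x,y) &= -c_{xx,yy} + \frac{c_{xx,y}\, c_{x,yy}}{c_{x,y}}
  = -(p-1)(p-2)(p-3)\, r^{p-4} + (p-1)(p-2)^{2}\, r^{p-4} \\
  &= (p-1)(p-2)\, r^{p-4}.
\end{align*}
```

The sign of $\sigma$ is therefore the sign of $(p-1)(p-2)$, which is nonnegative exactly for $p \ge 2$ or $p \le 1$. The value $p = 1$ is excluded because then $c_{x,y} = 0$.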

If the cost function $c$ is a metric then the optimal coupling in the case $E_1 = E_2 = \mathbb{R}$ can be reduced to the case of $E_1 \cap E_2 = \emptyset$ as in the classical Kantorovich-Rubinstein theorem. This is done by subtracting (and renormalizing) from the marginals $\mu_0, \nu_0$ the lattice infimum, i.e. defining, with the normalizing constant $a = 1 - (\mu_0 \wedge \nu_0)(\mathbb{R})$,

$$\mu_0' := \frac{1}{a} (\mu_0 - \mu_0 \wedge \nu_0), \quad \nu_0' := \frac{1}{a} (\nu_0 - \mu_0 \wedge \nu_0). \qquad (4.23)$$

The new probability measures live on disjoint subsets to which the previous proposition can be applied.

Some classes of optimal c-couplings for various distance functions $c$ have been discussed in Rüschendorf (1995), see also Rachev and Rüschendorf (1998). The examples discussed in these papers can be used to establish $c_n$-concavity of $f$ in some cases. This is an assumption used in Theorem 4.1 for the construction of the optimal stationary couplings. Note that $c_n$ is convex stable if $c$ is convex stable. Therefore the following proposition due to Figalli et al. (2010a) (partially Sei (2010c)) is also useful to construct a $c_n$-concave function $f$.

Proposition 4.11 Assume [B1] and [B2]. Then $c$ satisfies the non-negative cross curvature condition if and only if the space of c-concave functions is convex, that is, $(1 - \lambda) f + \lambda g$ is c-concave as long as $f$ and $g$ are c-concave and $\lambda \in [0, 1]$.

Example 4.12 Consider Example 4.10 again. Let $E_1 = (0, 1)$, $E_2 = (-\infty, 0)$, $c(x_1, y_1) = p^{-1} (x_1 - y_1)^p$ ($p \ge 2$) and $c_n(x, y) = (np)^{-1} \sum_{k=0}^{n-1} (x_k - y_k)^p$. An example of $c_n$-concave functions of the form $f(x) = \sum_{k=0}^{n-1} f_k(x_k)$ with suitable real functions $f_k$ is given in Rüschendorf (1993), Example 1 (b). We add a further example here. Put $\bar x = n^{-1} \sum_{k=0}^{n-1} x_k$ and let $f(x) = A(\bar x)$ with a real function $A$. We prove that $f(x)$ is $c_n$-concave if $A' \ge 1$ and $A'' \le 0$. For example, $A(\xi) = \xi + \sqrt{\xi}$ satisfies this condition. Equation (4.3) becomes

$$n^{-1} (x_i - y_i)^{p-1} = n^{-1} A'(\bar x) \qquad (4.24)$$

which uniquely determines $y_i \in E_2$ since $A' \ge 1$ and $x_i \in E_1$. To prove $c_n$-concavity of $f$, it is sufficient to show convexity of $x \mapsto c_n(x, y) - f(x)$ for each $y$. Indeed, the Hessian is

$$\delta_{ij}\, n^{-1} (p-1) (x_i - y_i)^{p-2} - n^{-2} A''(\bar x) \succeq -n^{-2} A''(\bar x) \succeq 0$$

in matrix sense. Note that the set of functions $A$ satisfying $A' \ge 1$ and $A'' \le 0$ is convex, which is consistent with Proposition 4.11. Therefore, any convex combination of $A(\bar x)$ and the $c_n$-concave function $\sum_k f_k(x_k)$ discussed above is also $c_n$-concave by Proposition 4.11.
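The construction in Example 4.12 can be tried out numerically. The sketch below assumes the hypothetical choice $A(\xi) = \xi + \sqrt{\xi}$, which satisfies $A' \ge 1$ and $A'' \le 0$ on $(0, 1)$. All function names are illustrative. It solves (4.24) for $y$ and then samples random points $z \in (0,1)^n$ to check that $z \mapsto c_n(z, y) - f(z)$ is not smaller than its value at $x$, which is the $c_n$-supergradient inequality.

```python
import math
import random

def c_n(x, y, p):
    # c_n(x, y) = (n p)^{-1} * sum_k (x_k - y_k)^p, with x_k > y_k
    return sum((a - b) ** p for a, b in zip(x, y)) / (len(x) * p)

def A(s):
    # Assumed example function with A' >= 1 and A'' <= 0 on (0, 1)
    return s + math.sqrt(s)

def A_prime(s):
    return 1.0 + 0.5 / math.sqrt(s)

def f(x):
    return A(sum(x) / len(x))

def F(x, p):
    # Solve (x_i - y_i)^(p-1) = A'(xbar) for y_i, as in (4.24)
    shift = A_prime(sum(x) / len(x)) ** (1.0 / (p - 1))
    return [a - shift for a in x]

def worst_gap(x, p, trials=2000, seed=0):
    # Minimum over sampled z of [c_n(z, y) - f(z)] - [c_n(x, y) - f(x)]
    rng = random.Random(seed)
    y = F(x, p)
    base = c_n(x, y, p) - f(x)
    gaps = []
    for _ in range(trials):
        z = [rng.uniform(0.001, 0.999) for _ in x]
        gaps.append(c_n(z, y, p) - f(z) - base)
    return min(gaps)

x = [0.2, 0.7, 0.5]
y = F(x, 3.0)
gap = worst_gap(x, 3.0)
```

A nonnegative `gap` (up to rounding) is consistent with the convexity argument in the example. Random sampling is of course only a sanity check, not a proof.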



Appendix 

A Gluing lemma for stationary measures 

The gluing lemma is a well known construction of joint distributions. We repeat this con- 
struction in order to derive an extension to the gluing of jointly stationary processes. For 
given probability measures $P$ and $Q$ on some measurable spaces $E_1$ and $E_2$, we denote the
set of joint probability measures on $E_1 \times E_2$ with marginals $P$ and $Q$ by $M(P, Q)$.

Lemma A.1 (Gluing lemma) Let $P_1, P_2, P_3$ be Borel probability measures on Polish spaces $E_1, E_2, E_3$, respectively. Let $P_{12} \in M(P_1, P_2)$ and $P_{23} \in M(P_2, P_3)$. Then there exists a probability measure $P_{123}$ on $E_1 \times E_2 \times E_3$ with marginals $P_{12}$ on $E_1 \times E_2$ and $P_{23}$ on $E_2 \times E_3$.

Proof: Let $P_{1|2}(\cdot \mid \cdot)$ be the regular conditional probability measure such that

$$P_{12}(A_1 \times A_2) = \int_{A_2} P_{1|2}(A_1 \mid x)\, P_2(dx)$$

and $P_{3|2}(\cdot \mid \cdot)$ be the regular conditional probability measure such that

$$P_{23}(A_2 \times A_3) = \int_{A_2} P_{3|2}(A_3 \mid x)\, P_2(dx).$$

Then a measure $P_{123}$ uniquely defined by

$$P_{123}(A_1 \times A_2 \times A_3) := \int_{A_2} P_{1|2}(A_1 \mid x)\, P_{3|2}(A_3 \mid x)\, P_2(dx)$$

satisfies the required condition. □

Next we consider an extension of the gluing lemma to stationary processes. We note that even if a measure $P_{123}$ on $E_1^{\mathbb{Z}} \times E_2^{\mathbb{Z}} \times E_3^{\mathbb{Z}}$ has stationary marginals $P_{12}$ on $E_1^{\mathbb{Z}} \times E_2^{\mathbb{Z}}$ and $P_{23}$ on $E_2^{\mathbb{Z}} \times E_3^{\mathbb{Z}}$, it is not necessarily true that $P_{123}$ is stationary. For example, consider the $\{-1, 1\}$-valued fair coin processes $X = (X_t)_{t \in \mathbb{Z}}$ and $Y = (Y_t)_{t \in \mathbb{Z}}$ independently, and let $Z_t = (-1)^t X_t Y_t$. Then $(X, Y)$ and $(Y, Z)$ have stationary marginal distributions respectively, but $(X, Y, Z)$ is not jointly stationary because $X_t Y_t Z_t = (-1)^t$.
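The counterexample can be simulated on a finite time window. The sketch below is illustrative only (the function name, the window $t = 0, \dots, 999$ and the seed are assumptions): it draws two independent fair coin sequences, builds $Z_t = (-1)^t X_t Y_t$, and confirms that the triple product is the deterministic alternating sequence $(-1)^t$, so the joint law of $(X, Y, Z)$ cannot be shift invariant.

```python
import random

def simulate(length, seed=0):
    # Independent fair coin sequences with values in {-1, 1}
    rng = random.Random(seed)
    xs = [rng.choice((-1, 1)) for _ in range(length)]
    ys = [rng.choice((-1, 1)) for _ in range(length)]
    # Z_t = (-1)^t * X_t * Y_t
    zs = [(-1) ** t * x * y for t, (x, y) in enumerate(zip(xs, ys))]
    return xs, ys, zs

xs, ys, zs = simulate(1000)
triple_products = [x * y * z for x, y, z in zip(xs, ys, zs)]
alternates = all(prod == (-1) ** t for t, prod in enumerate(triple_products))
```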

For given stationary measures $P$ and $Q$ on some product spaces, let $M_s(P, Q)$ be the set of jointly stationary measures with marginal distributions $P$ and $Q$ on the corresponding product spaces.



Lemma A.2 Let $E_1, E_2, E_3$ be Polish spaces. Let $P_1, P_2, P_3$ be stationary measures on $E_1^{\mathbb{Z}}, E_2^{\mathbb{Z}}, E_3^{\mathbb{Z}}$, respectively. Let $P_{12} \in M_s(P_1, P_2)$ and $P_{23} \in M_s(P_2, P_3)$. Then there exists a jointly stationary measure $P_{123}$ on $E_1^{\mathbb{Z}} \times E_2^{\mathbb{Z}} \times E_3^{\mathbb{Z}}$ with marginals $P_{12}$ and $P_{23}$.

Proof: One can apply the same construction as in the preceding lemma. □



B c-concave function 

We review some basic results on c-concavity. See Rüschendorf (1991, 1995), Rachev and Rüschendorf (1998), Villani (2003, 2009) for details.

Let $E_1$ and $E_2$ be two Polish spaces and $c : E_1 \times E_2 \to \mathbb{R}$ be a measurable function.

Definition B.1 We define the c-transforms of functions $f$ on $E_1$ and $g$ on $E_2$ by

$$f^c(y) := \inf_{x \in E_1} \{c(x, y) - f(x)\} \quad \text{and} \quad g^c(x) := \inf_{y \in E_2} \{c(x, y) - g(y)\}.$$

A function $f$ on $E_1$ is called c-concave if there exists some function $g$ on $E_2$ such that $f(x) = g^c(x)$.

In general, $f^{cc} \ge f$ holds. Indeed, for any $x$ and $y$, we have $c(x, y) - f^c(y) \ge f(x)$. Then $f^{cc}(x) = \inf_y \{c(x, y) - f^c(y)\} \ge f(x)$.

Lemma B.2 Let $f$ be a function on $E_1$. Then $f$ is c-concave if and only if $f^{cc} = f$.

Proof: The "if" part is obvious. We prove the "only if" part. Assume $f = g^c$. Then $f^c = g^{cc} \ge g$, and therefore

$$f^{cc}(x) = \inf_y \{c(x, y) - f^c(y)\} \le \inf_y \{c(x, y) - g(y)\} = g^c(x) = f(x).$$

Since $f^{cc} \ge f$ always holds, we have $f^{cc} = f$. □
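On finite sets the c-transform is a minimum over finitely many values, so the statements above can be checked exactly. The following sketch is an illustration under assumptions not made in the paper: $E_1$ and $E_2$ are replaced by the same finite grid, the cost is $c(x,y) = (x-y)^2/2$, and the function names are hypothetical. It shows $f^{cc} = f$ for a function of the form $f = g^c$ and $f^{cc} \ge f$ with strict inequality for a function that is not c-concave.

```python
def c_transform_f(f_vals, cost):
    # f^c(y_j) = min_i { c(x_i, y_j) - f(x_i) }
    n_x, n_y = len(cost), len(cost[0])
    return [min(cost[i][j] - f_vals[i] for i in range(n_x)) for j in range(n_y)]

def c_transform_g(g_vals, cost):
    # g^c(x_i) = min_j { c(x_i, y_j) - g(y_j) }
    n_x, n_y = len(cost), len(cost[0])
    return [min(cost[i][j] - g_vals[j] for j in range(n_y)) for i in range(n_x)]

# Finite grids standing in for E_1 and E_2 (an assumption for the sketch)
xs = [i / 4.0 for i in range(-8, 9)]
ys = [j / 4.0 for j in range(-8, 9)]
cost = [[0.5 * (x - y) ** 2 for y in ys] for x in xs]

# f = g^c is c-concave by definition, so f^cc should reproduce f
g = [abs(y) for y in ys]
f = c_transform_g(g, cost)
f_cc = c_transform_g(c_transform_f(f, cost), cost)

# An arbitrary function: only f2^cc >= f2 is guaranteed
f2 = [x ** 2 for x in xs]
f2_cc = c_transform_g(c_transform_f(f2, cost), cost)
```

Here `f2_cc` is the smallest c-concave function on the grid lying above `f2`, in line with the inequality $f^{cc} \ge f$.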

Define the c-supergradient of any function $f : E_1 \to \mathbb{R}$ by

$$\partial^c f(x) = \{y \in E_2 \mid c(x, y) - f(x) = f^c(y)\}.$$

Lemma B.3 Assume that $\partial^c f(x) \ne \emptyset$ for any $x \in E_1$. Then $f$ is c-concave.

Proof: Fix $x \in E_1$ and let $y \in \partial^c f(x)$. Then we have

$$f(x) = c(x, y) - f^c(y) \ge f^{cc}(x) \ge f(x).$$

Hence $f^{cc} = f$ and thus $f$ is c-concave. □



The converse of Lemma B.3 does not hold in general. For example, consider $E_1 = [0, \infty)$, $E_2 = \mathbb{R}$ and $c(x, y) = -xy$. Then c-concavity is equivalent to usual concavity. The function $f(x) = \sqrt{x}$ is concave but the supergradient at $x = 0$ is empty.



C Proof of Proposition 4.9

Consider the cost function $c(x, y)$ on $E_1 \times E_2$ with the assumptions in Proposition 4.9. Since $c_{x,y} \ne 0$, the map $y \mapsto c_x(x, y)$ is injective. Denote its image and inverse function by $U = \{c_x(x, y) \mid y \in E_2\}$ and $\eta_x = (c_x(x, \cdot))^{-1} : U \to E_2$, respectively. Hence $c_x(x, \eta_x(u)) = u$ for all $u \in U$ and $\eta_x(c_x(x, y)) = y$ for all $y \in E_2$. Note that $U$ is an interval and therefore convex. Also note that the subscript $x$ of $\eta_x$ does not mean the derivative. By symmetry, we can define $V = \{c_y(x, y) \mid x \in E_1\}$ and $\xi_y = (c_y(\cdot, y))^{-1} : V \to E_1$.

We first characterize the c-gradient of a differentiable c-concave function $f$. Let $x \in E_1$ and $y \in \partial^c f(x)$. Then $c(x, y) - f(x) \le c(z, y) - f(z)$ for any $z \in E_1$. By the tangent condition at $z = x$, we have $c_x(x, y) - f'(x) = 0$, or equivalently, $y = \eta_x(f'(x))$. Hence we have $\partial^c f(x) = \{\eta_x(f'(x))\}$. We denote the unique element also by $\partial^c f(x) = \eta_x(f'(x))$.

To prove Proposition 4.9, it is sufficient to show that the following conditions are equivalent:

(i) $c$ is convex stable for any index $n$.

(ii) The map $u \mapsto c(x, \eta_x(u)) - c(z, \eta_x(u))$ is convex for all $x, z \in E_1$.

(iii) $-c_{xx,yy} + c_{xx,y} c_{x,yy} / c_{x,y} \ge 0$.

We first prove (i) $\Leftrightarrow$ (ii). Assume (i). Let $\mathbb{Q}$ be the set of rational numbers. By the definition of convex stability, for any $u_0, u_1 \in U$ and $\lambda \in [0, 1] \cap \mathbb{Q}$, the function

$$\phi(x) := (1 - \lambda)\, c(x, \eta_x(u_0)) + \lambda\, c(x, \eta_x(u_1))$$

is c-concave. The c-gradient of $\phi$ is given by

$$\partial^c \phi(x) = \eta_x\big((1 - \lambda)\, c_x(x, \eta_x(u_0)) + \lambda\, c_x(x, \eta_x(u_1))\big) = \eta_x((1 - \lambda) u_0 + \lambda u_1).$$

Then c-concavity, $c(x, \partial^c \phi(x)) - \phi(x) \le c(z, \partial^c \phi(x)) - \phi(z)$ for any $z$, is equivalent to

$$c(x, \eta_x((1 - \lambda) u_0 + \lambda u_1)) - c(z, \eta_x((1 - \lambda) u_0 + \lambda u_1)) \le (1 - \lambda) \{c(x, \eta_x(u_0)) - c(z, \eta_x(u_0))\} + \lambda \{c(x, \eta_x(u_1)) - c(z, \eta_x(u_1))\}.$$

Since both sides are continuous with respect to $\lambda$, the inequality extends to all $\lambda \in [0, 1]$ and (ii) is obtained. The converse is similar.

Next we prove (ii) $\Rightarrow$ (iii). Assume (ii). Fix $x, z \in E_1$ and $u_0 \in U$. Let $y_0 = \eta_x(u_0)$ and therefore $u_0 = c_x(x, y_0)$. Since $u \mapsto c(x, \eta_x(u)) - c(z, \eta_x(u))$ is convex for any $z$, its second derivative at $u = u_0$ is non-negative:

$$\partial_u^2 \{c(x, \eta_x(u)) - c(z, \eta_x(u))\}\big|_{u = u_0} = \{c_{yy}(x, y_0) - c_{yy}(z, y_0)\} (\eta_x^{(1)}(u_0))^2 + \{c_y(x, y_0) - c_y(z, y_0)\}\, \eta_x^{(2)}(u_0) \ge 0. \qquad (C.1)$$

On the other hand, by differentiating the identity $c_x(x, \eta_x(u)) = u$ twice at $u = u_0$, we have

$$c_{x,yy}(x, y_0) (\eta_x^{(1)}(u_0))^2 + c_{x,y}(x, y_0)\, \eta_x^{(2)}(u_0) = 0.$$

Combining the two relations, we have

$$\Big[\{c_{yy}(x, y_0) - c_{yy}(z, y_0)\} - \frac{c_{x,yy}(x, y_0)}{c_{x,y}(x, y_0)} \{c_y(x, y_0) - c_y(z, y_0)\}\Big] (\eta_x^{(1)}(u_0))^2 \ge 0.$$

Since $\eta_x^{(1)}(u_0) = 1 / c_{x,y}(x, y_0) \ne 0$, we obtain

$$\{c_{yy}(x, y_0) - c_{yy}(z, y_0)\} - \frac{c_{x,yy}(x, y_0)}{c_{x,y}(x, y_0)} \{c_y(x, y_0) - c_y(z, y_0)\} \ge 0.$$

Now let $v_0 = c_y(x, y_0)$ and $v = c_y(z, y_0)$. Then $x = \xi_{y_0}(v_0)$ and $z = \xi_{y_0}(v)$ from the definition of $\xi_y$. We have

$$\{c_{yy}(\xi_{y_0}(v_0), y_0) - c_{yy}(\xi_{y_0}(v), y_0)\} - \frac{c_{x,yy}(\xi_{y_0}(v_0), y_0)}{c_{x,y}(\xi_{y_0}(v_0), y_0)} (v_0 - v) \ge 0. \qquad (C.2)$$

This means convexity of the map $v \mapsto -c_{yy}(\xi_{y_0}(v), y_0)$. Hence its second derivative is non-negative. Therefore

$$-c_{xx,yy}(z, y_0) (\xi_{y_0}^{(1)}(v))^2 - c_{x,yy}(z, y_0)\, \xi_{y_0}^{(2)}(v) \ge 0.$$

On the other hand, by differentiating the identity $c_y(\xi_{y_0}(v), y_0) = v$ twice, we have

$$c_{xx,y}(z, y_0) (\xi_{y_0}^{(1)}(v))^2 + c_{x,y}(z, y_0)\, \xi_{y_0}^{(2)}(v) = 0.$$

Combining the two relations, we have

$$\Big\{-c_{xx,yy}(z, y_0) + c_{x,yy}(z, y_0) \frac{c_{xx,y}(z, y_0)}{c_{x,y}(z, y_0)}\Big\} (\xi_{y_0}^{(1)}(v))^2 \ge 0. \qquad (C.3)$$

Since $\xi_{y_0}^{(1)}(v) = 1 / c_{x,y}(z, y_0) \ne 0$, we conclude

$$-c_{xx,yy}(z, y_0) + c_{x,yy}(z, y_0) \frac{c_{xx,y}(z, y_0)}{c_{x,y}(z, y_0)} \ge 0.$$

Since $z$ and $y_0 (= \eta_x(u_0))$ are arbitrary, we obtain (iii).

The proof of (iii) $\Rightarrow$ (ii) is just the converse. First, (C.3) follows from (iii). Since (C.3) is the second derivative of the left hand side of (C.2), the convexity condition (C.2) follows. The condition (C.2) is equivalent to (C.1), and (C.1) is also equivalent to (ii). This completes the proof.